Agent Learning in Relational Domains based on Logical MDPs with Negation

Authors

  • Song Zhiwei
  • Chen Xiaoping
  • Cong Shuang
Abstract

In this paper, we propose a model named Logical Markov Decision Processes with Negation for Relational Reinforcement Learning, which allows reinforcement learning algorithms to be applied to relational domains whose states and actions are in relational form. In the model, logical negation is represented explicitly, so that the abstract state space can be constructed from the goal state(s) of a given task simply by applying a generating method and an expanding method, and each ground state is represented by one and only one abstract state. Prototype actions are also introduced into the model, so that the applicable abstract actions can be obtained automatically. Based on the model, a model-free Θ(λ)-learning algorithm is implemented to evaluate the state-action-substitution value function. We also propose a state refinement method, guided by two formal definitions of the self-loop degree and the common characteristic of abstract states, that lets the agent construct the abstract state space automatically rather than manually. The experiments show that the agent captures the core of the given task and that the final state space is intuitive.
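The Θ(λ)-learning the abstract describes is a TD(λ)-style algorithm operating over abstract states. As a rough illustration of that machinery only, the sketch below is plain tabular Watkins' Q(λ) with eligibility traces on an invented four-state chain; the domain, the function names, and all parameter values are assumptions for illustration, not the authors' algorithm or representation:

```python
import random
from collections import defaultdict

def q_lambda(episodes, actions, step, start, is_goal,
             alpha=0.1, gamma=0.9, lam=0.8, eps=0.1, seed=0):
    """Tabular Watkins' Q(lambda): TD control with eligibility traces,
    keyed here by (state, action) pairs."""
    rng = random.Random(seed)
    Q = defaultdict(float)

    def greedy(s):
        acts = actions(s)
        best = max(Q[(s, a)] for a in acts)
        return rng.choice([a for a in acts if Q[(s, a)] == best])

    for _ in range(episodes):
        trace = defaultdict(float)          # eligibility traces e(s, a)
        s = start()
        while not is_goal(s):
            a = rng.choice(actions(s)) if rng.random() < eps else greedy(s)
            was_greedy = Q[(s, a)] == max(Q[(s, b)] for b in actions(s))
            s2, r = step(s, a)
            future = 0.0 if is_goal(s2) else max(Q[(s2, b)] for b in actions(s2))
            delta = r + gamma * future - Q[(s, a)]
            trace[(s, a)] += 1.0            # accumulating trace
            for key in list(trace):
                Q[key] += alpha * delta * trace[key]
                # Watkins' variant: traces are cut after exploratory actions
                trace[key] = trace[key] * gamma * lam if was_greedy else 0.0
            s = s2
    return Q

# Invented four-state chain: states 0..3, goal = 3, reward 1 on reaching it.
def actions(s):
    return ['left', 'right']

def step(s, a):
    s2 = min(s + 1, 3) if a == 'right' else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0)

Q = q_lambda(300, actions, step, start=lambda: 0, is_goal=lambda s: s == 3)
```

In the paper's setting the table would instead be indexed by abstract state-action pairs together with substitutions, so one learned value generalizes over all ground states covered by an abstract state.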


Related papers

Building Relational World Models for Reinforcement Learning

Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their capacity to exploit the relational nature of the domain. Our algorithm, AMBIL, constructs relational world models in the form of relational Markov decision processes (MDPs). AMBIL works backwards from collections of high-reward state...


Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes

We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case ...


Bellman goes Relational (extended abstract)

We introduce ReBel, a relational Bellman update operator that can be used for Markov Decision Processes in (possibly infinite) relational domains. Using ReBel we develop a relational value iteration algorithm. Many reinforcement learning (RL) and dynamic programming techniques have been developed for solving Markov Decision Processes (MDP). Until recentl...


Logical Markov Decision Programs and the Convergence of Logical TD(lambda)

Recent developments in the area of relational reinforcement learning (RRL) have resulted in a number of new algorithms. A theory, however, that explains why RRL works, seems to be lacking. In this paper, we provide some initial results on a theory of RRL. To realize this, we introduce a novel representation formalism, called logical Markov decision programs (LOMDPs), that integrates Markov Deci...


Extending the Qualitative Trajectory Calculus Based on the Concept of Accessibility of Moving Objects in the Paths

Qualitative spatial representation and reasoning are among the important capabilities in intelligent geospatial information system development. Although a large contribution to the study of moving objects has come from the quantitative use and analysis of data, such calculations are ineffective when data on position and geometry are scarce or inaccurate, or when explicitly explaining ...



Journal:
  • JCP

Volume 3, Issue

Pages -

Published 2008